Best Arm Identification in Restless Markov Multi-Armed Bandits
Authors
Abstract
We study the problem of identifying the best arm in a multi-armed bandit environment when each arm is a time-homogeneous and ergodic discrete-time Markov process on a common, finite state space. The state evolution of each arm is governed by the arm's transition probability matrix (TPM). A decision entity that knows the set of TPMs, but not the exact mapping of the TPMs to the arms, wishes to find the index of the best arm as quickly as possible, subject to an upper bound on the error probability. The decision entity selects one arm at a time sequentially, while all the unselected arms continue to undergo state evolution (restless arms). For this problem, we derive the first-known instance-dependent asymptotic lower bound on the growth rate of the expected time required to find the index of the best arm, where the asymptotics is as the error probability vanishes. Further, we propose a sequential policy that, for an input parameter $R$, forcibly selects an arm that has not been selected for $R$ consecutive time instants. We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$. The question of whether, in general, the limiting value of the upper bound matches the lower bound remains open; we identify a special case in which the two bounds match. Prior works on best arm identification have dealt with (a) independent and identically distributed observations from the arms, and (b) rested Markov arms, whereas our work deals with the more difficult setting of restless arms.
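The restless dynamics and the forced-selection idea described above can be sketched in a toy simulation. Everything below is an illustrative assumption: the two 2-state TPMs, the greedy selection rule, and the horizon are invented for the sketch and are not the paper's policy, parameters, or guarantees; only the restless evolution (all arms move every instant) and the rule "force-select any arm unselected for $R$ consecutive instants" come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two restless arms on a common 2-state space {0, 1}; the state doubles as
# the observation. These TPMs are illustrative, not from the paper.
TPMS = [
    np.array([[0.9, 0.1],
              [0.2, 0.8]]),  # arm 0: stationary mean state = 1/3
    np.array([[0.3, 0.7],
              [0.1, 0.9]]),  # arm 1: stationary mean state = 7/8 (better arm)
]

def step(states):
    """Advance every arm one step (restless: unselected arms evolve too)."""
    return [rng.choice(2, p=TPMS[a][s]) for a, s in enumerate(states)]

def run(horizon=5000, R=10):
    """Toy sampler with the forced-selection idea: if an arm has gone R
    consecutive instants without being selected, select it; otherwise
    select the arm with the highest empirical mean state (a hypothetical
    greedy rule used here purely for illustration)."""
    states = [0, 0]
    sums = np.zeros(2)
    counts = np.zeros(2)
    since = np.zeros(2, dtype=int)  # instants since each arm was last selected
    for _ in range(horizon):
        stale = np.flatnonzero(since >= R)
        if stale.size:
            a = int(stale[0])            # forced selection
        else:
            means = np.where(counts > 0, sums / np.maximum(counts, 1), np.inf)
            a = int(np.argmax(means))    # greedy otherwise
        sums[a] += states[a]
        counts[a] += 1
        since += 1
        since[a] = 0
        states = step(states)            # all arms move, selected or not
    return int(np.argmax(sums / counts))

print(run())  # → 1: arm 1 has the larger stationary mean state
```

Note the restless feature: `step` advances every arm at each instant, so the learner only glimpses each chain's current state when it happens to select that arm, which is what makes this setting harder than rested Markov arms.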
Similar papers
Best Arm Identification in Multi-Armed Bandits
We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optimal arm and the mean reward of the ultimately chosen arm. We propose a highly exploring UCB policy and a new algorithm based on successive rejects. We show that these algorithms are essentially optimal since their regre...
Best arm identification in multi-armed bandits with delayed feedback
We propose a generalization of the best arm identification problem in stochastic multiarmed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed. We propose a general framework ...
Practical Algorithms for Best-K Identification in Multi-Armed Bandits
In the Best-K identification problem (Best-K-Arm), we are given N stochastic bandit arms with unknown reward distributions. Our goal is to identify the K arms with the largest means with high confidence, by drawing samples from the arms adaptively. This problem is motivated by various practical applications and has attracted considerable attention in the past decade. In this paper, we propose n...
Best-Arm Identification in Linear Bandits
We study the best-arm identification problem in linear bandits, where the rewards of the arms depend linearly on an unknown parameter θ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In parti...
Multi-armed restless bandits, index policies, and dynamic priority allocation
This paper presents a brief introduction to the emerging research field of multi-armed restless bandits (MARBs), which substantially extend the modeling power of classic multi-armed bandits. MARBs are Markov decision process models for optimal dynamic priority allocation to a collection of stochastic binary-action (active/passive) projects evolving over time. Interest in MARBs has grown steadil...
Journal
Journal title: IEEE Transactions on Information Theory
Year: 2023
ISSN: 0018-9448, 1557-9654
DOI: https://doi.org/10.1109/tit.2022.3230939